Impact of COVID-19 on Canada/Alberta from Oil Price perspective

Presented by:

Orlando Morales - 30190412

Soma Dipti - 30185455

Swarnim Khosla - 30188716

Introduction

Economy is the area of study for our research. We want to learn how Canada and Alberta are faring because we have recently emerged from one of the most devastating global health crises. Everyone has been affected by the COVID-19 pandemic losses in some manner over the past two and a half years. GDP and CPI are two of the most crucial factors in determining a nation's and its citizens' level of economicĀ health. Oil and gas become a pertinent factor for our study of Canada and Alberta because it is one of the greatest contributors to Canada's GDP. Alberta produces the majority of Canada's oil, therefore research into the oil and gas business is done to better understand how the province's economy is affected and recovering.

The importance of this issue is to have a broad perspective on the nation's current state, as this will immediately assist a person in making wise decisions. Our datasets were split into four groups based on market factors that could provide insight into the state of the economy.

Four datasets are included in the first category, which is used to examine the Canadian economy and pinpoint any key variables influencing change. The first one includes the GDP and growth rate for each nation, while the other two include population data and globe map coordinates. We compute the GDP per capita using these numbers. The daily Oil Price Index across the years is provided by the fourth dataset. These figures serve as a comparison between the current rate of growth and that of the previous year. It willĀ also assist in determining whether there is any relationship, if any, between the daily global oil price index and GDP per capita.

We used a single dataset containing consumer price index across industries throughout the years to determine the rate at which the price values for products and services have changed over time and the impact of covid on it. This will help us analyse the impact of pandemic. The highest contributor's behaviour is observed, and this behaviour is compared to the recovery to draw conclusions.

We have examined the money produced by the industry that contributes the most in order to comprehend the economic recovery following the epidemic.Ā This was accomplished using seven datasets that included company-specific sales information from 2016 to 2021. These statistics were important because they let us to calculate annual gross and net revenue as well as royalties paid to the government in the form of taxes, which allowed us to predict the economic recovery.

The first two datasets in our last category are historical lending loan and prime rate of interest as determined by the government of Alberta. The next two had theĀ average weekly earnings by industry and the unemployment rate for young adults and those over the age of 15. With all these data set combined we can infer Alberta and Canada's overall economic health , thus helping one make informed financial data.

Guiding Questions:

  1. How does Canada’s GDP was affected by COVID?

  2. Which Alberta’s sector was the most impacted during COVID?

  3. How has inflation impacted different aspects of people’s life?

  4. Conclusion: General situation

These inquiries are crucial for tracking the pandemic's effects as well as the province's present economic state and Canada's general situation. These details will give insight into the financial situation and aid in decision-making.

Files available at:

https://drive.google.com/drive/folders/1Yj5482TAmhggvqYr6_NgouR2Vbdw4Pf-?usp=sharing

In [1]:
#Import all libraries to be used:

#data frames
import numpy as np
import pandas as pd
import geopandas as gpd

#datetime
import datetime as dt
import calendar

#statistical
from scipy.stats import pearsonr

#visualizations
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib import rcParams

import seaborn as sns

from plotly.subplots import make_subplots

import plotly.express as px 
import plotly.graph_objects as go

# Display all cell outputs
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

#Install openpyxl
import sys
!{sys.executable} -m pip install openpyxl
Requirement already satisfied: openpyxl in /opt/conda/lib/python3.10/site-packages (3.0.10)
Requirement already satisfied: et-xmlfile in /opt/conda/lib/python3.10/site-packages (from openpyxl) (1.1.0)

Guiding question 1: How does Canada’s GDP was affected by COVID?

  • Canada’s GDP by year – Timeseries
  • O&G price performance - Timeseries
  • Correlation between Oil Price and GDP – Scatter plot

Data set used:

  • GDP by country
  • Coordinates Map
  • Population by country

Libraries used for visualizations:

  • Pyplot
  • Geopandas

Data Cleaning:

  • Read CSV
  • Handle 0s and NaN
  • Transform astype
  • Melt
  • Rename columns
  • Merge

Datasets including GDP data and population data and coordinates for the map's presentation were combined on the date column to provide a response to the first guiding question. For both data sets, data cleaning procedures comprised column renaming, column insertion and calculation, formatting numbers, and transposing data frames.

In [2]:
#Read data for GDP and Map
rdata_gdp_country = pd.read_csv("./gdp_csv.csv")
rdata_gdp_country2022 = pd.read_csv("./gdp1_2022.csv") 

coordinates = pd.DataFrame(pd.read_html('https://developers.google.com/public-data/docs/canonical/countries_csv')[0])
In [3]:
rdata_gdp_country.head()
rdata_gdp_country2022.head()
Out[3]:
Country Name Country Code Year Value
0 Arab World ARB 1968 2.576068e+10
1 Arab World ARB 1969 2.843420e+10
2 Arab World ARB 1970 3.138550e+10
3 Arab World ARB 1971 3.642691e+10
4 Arab World ARB 1972 4.331606e+10
Out[3]:
Country Name Country Code Indicator Name Indicator Code 1960 1961 1962 1963 1964 1965 ... 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
0 Aruba ABW GDP (current US$) NY.GDP.MKTP.CD NaN NaN NaN NaN NaN NaN ... 2.615084e+09 2.727933e+09 2.791061e+09 2.963128e+09 2.983799e+09 3.092179e+09 3.202235e+09 3.310056e+09 2.496648e+09 NaN
1 Africa Eastern and Southern AFE GDP (current US$) NY.GDP.MKTP.CD 2.129059e+10 2.180847e+10 2.370702e+10 2.821004e+10 2.611879e+10 2.968217e+10 ... 9.730430e+11 9.839370e+11 1.003680e+12 9.242530e+11 8.823550e+11 1.020650e+12 9.910220e+11 9.975340e+11 9.216460e+11 1.082100e+12
2 Afghanistan AFG GDP (current US$) NY.GDP.MKTP.CD 5.377778e+08 5.488889e+08 5.466667e+08 7.511112e+08 8.000000e+08 1.006667e+09 ... 1.990732e+10 2.014640e+10 2.049713e+10 1.913421e+10 1.811656e+10 1.875347e+10 1.805323e+10 1.879945e+10 2.011614e+10 NaN
3 Africa Western and Central AFW GDP (current US$) NY.GDP.MKTP.CD 1.040414e+10 1.112789e+10 1.194319e+10 1.267633e+10 1.383837e+10 1.486223e+10 ... 7.275700e+11 8.207930e+11 8.649900e+11 7.607340e+11 6.905460e+11 6.837490e+11 7.416900e+11 7.945430e+11 7.844460e+11 8.358080e+11
4 Angola AGO GDP (current US$) NY.GDP.MKTP.CD NaN NaN NaN NaN NaN NaN ... 1.249980e+11 1.334020e+11 1.372440e+11 8.721929e+10 4.984049e+10 6.897276e+10 7.779294e+10 6.930910e+10 5.361907e+10 7.254699e+10

5 rows Ɨ 66 columns

In [4]:
#countries transformations
countries = rdata_gdp_country2022['Country Name'],rdata_gdp_country2022['2021'],rdata_gdp_country2022['Country Code']
countries = pd.DataFrame(countries).T
countries['2021'].astype('float')
countries.rename(columns={'Country Name':'name','Country Code':'iso_a3'},inplace=True)
for column in countries['2021']:
    countries['2021'] = countries['2021'].fillna(0)
countries = pd.merge(coordinates, countries, how='outer', on='name')

#World Map transformation
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
world = pd.merge(world, countries, how='inner', on='iso_a3')
world['gdp_pc'] = world['2021']/world['pop_est']
Out[4]:
0               NaN
1      1.082100e+12
2               NaN
3      8.358080e+11
4      7.254699e+10
           ...     
261    9.007159e+09
262    2.106169e+10
263    4.199460e+11
264    2.120306e+10
265    2.621773e+10
Name: 2021, Length: 266, dtype: float64
ERROR 1: PROJ: proj_create_from_database: Open of /opt/conda/share/proj failed
In [5]:
#print data frames: Countries GDP and Coordinates
countries.head()
world.head()
Out[5]:
country latitude longitude name 2021 iso_a3
0 AD 42.546245 1.601554 Andorra 3.329911e+09 AND
1 AE 23.424076 53.847818 United Arab Emirates 0.000000e+00 ARE
2 AF 33.939110 67.709953 Afghanistan 0.000000e+00 AFG
3 AG 17.060816 -61.796428 Antigua and Barbuda 1.471126e+09 ATG
4 AI 18.220554 -63.068615 Anguilla NaN NaN
Out[5]:
pop_est continent name_x iso_a3 gdp_md_est geometry country latitude longitude name_y 2021 gdp_pc
0 889953.0 Oceania Fiji FJI 5496 MULTIPOLYGON (((180.00000 -16.06713, 180.00000... FJ -16.578193 179.414413 Fiji 4.592119e+09 5159.956436
1 58005463.0 Africa Tanzania TZA 63177 POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... TZ -6.369028 34.888822 Tanzania 6.777510e+10 1168.426184
2 37589262.0 North America Canada CAN 1736425 MULTIPOLYGON (((-122.84000 49.00000, -122.9742... CA 56.130366 -106.346771 Canada 1.990760e+12 52960.869516
3 328239523.0 North America United States of America USA 21433226 MULTIPOLYGON (((-122.84000 49.00000, -120.0000... US 37.090240 -95.712891 United States 2.299610e+13 70058.900250
4 18513930.0 Asia Kazakhstan KAZ 181665 POLYGON ((87.35997 49.21498, 86.59878 48.54918... KZ 48.019573 66.923684 Kazakhstan 1.908140e+11 10306.509747
In [6]:
#plot parameters for World map
fig, ax = plt.subplots(1, 1)

orig_map=plt.cm.get_cmap('summer')
reversed_map = orig_map.reversed()

rcParams['figure.figsize'] = 12,8
wm = world.plot(column='gdp_pc',cmap = reversed_map, ax=ax, legend=True,
           legend_kwds={'label': "GDP per capita by Country 2021",'orientation': "horizontal"})

ax.set(title='World Map \n GDP per Capita')
plt.show()
Out[6]:
[Text(0.5, 1.0, 'World Map \n GDP per Capita')]

The GDP per capita is shown on a world map, and the greener it is, the higher the value. All countries are shown, however it is clear that Canada receives an average value of between $50,000 and $60,000 per person.

Data set comprising information about the entire world was utilized for below section, therefore data processing included renaming and cleaning up columns, removing, and filling in blanks and zeroes. For the visualizations, a filter in Country was applied to Canada.

In [7]:
#Data cleaning for GDP

#data set to work with
gdpset = rdata_gdp_country2022

#Eliminate blanks from column names
gdpset.rename(columns={
                'Country Name':'Country_Name',
                'Country Code':'Country_Code',
                'Indicator Name':'Indicator_Name',
                'Indicator Code':'Indicator_Code'},
              inplace=True)

#columns to drop
gdpset_to_drop = ['Indicator_Code','Indicator_Name','Country_Code',
                  '1960','1961','1962','1963','1964','1965','1966','1967','1968','1969',
                 '1970','1971','1972','1973','1974','1975','1976','1977','1978','1979',
                  '1980','1981','1982','1983','1984','1985','1986','1987','1988','1989',
                  '1990','1991','1992','1993','1994','1995','1996','1997','1998','1999',
                  '2000','2001','2002','2003','2004','2005','2006','2007','2008','2009'
                  ]

gdpset = gdpset.drop(gdpset_to_drop,axis=1)

#filling blanks and assiging NaN
for column in gdpset:
    gdpset[column] = gdpset[column].fillna(np.NaN)
    gdpset[column] = gdpset[column].replace(0,np.NaN)
In [8]:
#print data frame: GDP 
gdpset.head()
Out[8]:
Country_Name 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
0 Aruba 2.453631e+09 2.637989e+09 2.615084e+09 2.727933e+09 2.791061e+09 2.963128e+09 2.983799e+09 3.092179e+09 3.202235e+09 3.310056e+09 2.496648e+09 NaN
1 Africa Eastern and Southern 8.604780e+11 9.644180e+11 9.730430e+11 9.839370e+11 1.003680e+12 9.242530e+11 8.823550e+11 1.020650e+12 9.910220e+11 9.975340e+11 9.216460e+11 1.082100e+12
2 Afghanistan 1.585668e+10 1.780511e+10 1.990732e+10 2.014640e+10 2.049713e+10 1.913421e+10 1.811656e+10 1.875347e+10 1.805323e+10 1.879945e+10 2.011614e+10 NaN
3 Africa Western and Central 5.915960e+11 6.709830e+11 7.275700e+11 8.207930e+11 8.649900e+11 7.607340e+11 6.905460e+11 6.837490e+11 7.416900e+11 7.945430e+11 7.844460e+11 8.358080e+11
4 Angola 8.169956e+10 1.094370e+11 1.249980e+11 1.334020e+11 1.372440e+11 8.721929e+10 4.984049e+10 6.897276e+10 7.779294e+10 6.930910e+10 5.361907e+10 7.254699e+10
In [9]:
#Plot GDP by time, Canada vs average world 

#Filter by Canada
gdpcan = gdpset[(gdpset.Country_Name == 'Canada')]

#manipulating data to plot
gdpcan = gdpcan.melt(id_vars=['Country_Name'],var_name="Year")
gdpcan.rename(columns={'value':'GDP'},inplace=True)

#plot
gdpcan_plt = px.line(gdpcan, x='Year', y='GDP', height=500)
gdpcan_plt= gdpcan_plt.update_layout(title='Canada\'s Gross Domestic Product (GPD) by Year')
gdpcan_plt= gdpcan_plt.update_traces(line=dict(color='orange'))

gdpcan_plt= gdpcan_plt.add_vrect(x0="9", x1="11", annotation_text="COVID19", annotation_position="top right",
              fillcolor="green", opacity=0.25, line_width=0)

gdpcan_plt= gdpcan_plt.add_vrect(x0="4", x1="6", annotation_text="Recession", annotation_position="top right",
              fillcolor="green", opacity=0.25, line_width=0)
In [10]:
gdpcan_plt.show()

From the above time series showing GDP, two interesting periods are marked, the first one is the 2015 recession and the second one is the COVID-19 pandemic, both periods show a decrease in GDP but at the same time after those periods a recovery can be seen

The data set containing historical oil prices was read using skiprows and the date column was parsed for the following steps. For the visualisation,plotly express was used to oil prices.

In [11]:
#raw oil price
rdata_gdp_oilprice = pd.read_csv("./wti-crude-oil-prices-10-year-daily-chart.csv",skiprows=15,parse_dates=['date']) 

#manipulation of date
dseries = rdata_gdp_oilprice['date'].map(lambda x: x.day)
mseries = rdata_gdp_oilprice['date'].map(lambda x: x.month)
yseries = rdata_gdp_oilprice['date'].map(lambda x: x.year)

rdata_gdp_oilprice.insert(1,'Day',dseries)
rdata_gdp_oilprice.insert(2,'Month',mseries)
rdata_gdp_oilprice.insert(3,'Year',yseries)
In [12]:
#print data frame: GDP and Oil Price by month
rdata_gdp_oilprice.head()
Out[12]:
date Day Month Year value
0 2012-10-10 10 10 2012 91.25
1 2012-10-11 11 10 2012 92.07
2 2012-10-12 12 10 2012 91.86
3 2012-10-15 15 10 2012 91.85
4 2012-10-16 16 10 2012 92.09
In [13]:
#subset oilprice
oil_price = rdata_gdp_oilprice
oil_price.rename(columns={' value':'Oil_price','date':'Date'},inplace=True)
oil_price.dropna(subset = ["Oil_price"], inplace=True)

#plot Oil price
oil_plt = px.line(oil_price, x='Date', y='Oil_price', height=500,title="Time Series for Oil Price")
oil_plt=oil_plt.update_layout(xaxis=dict(rangeslider=dict(visible=True)))
In [14]:
oil_plt.show()

Oil prices through time are shown in the chart above, as for the last chart, it can be seen that the same time periods (2015 recession and COVID-19 pandemic) show a drop in oil prices.

In [15]:
#manipulate variables and create a new data frame for correlation
gdpcor = gdpcan
gdpcor['Year'] = gdpcor['Year'].astype(int)

op_dropc = ['Date','Day','Month']
opcor = oil_price.drop(op_dropc, axis=1)
opcor = opcor.groupby('Year', as_index=False).mean()

gdpop = pd.merge(gdpcor,opcor,how='inner',on='Year')
gdpop_pearson = pearsonr(gdpop['GDP'],gdpop['Oil_price'])
In [16]:
#print data frame: GDP and Oil Price
gdpop.head()
Out[16]:
Country_Name Year GDP Oil_price
0 Canada 2012 1.828370e+12 87.942140
1 Canada 2013 1.846600e+12 98.068722
2 Canada 2014 1.805750e+12 92.849937
3 Canada 2015 1.556510e+12 48.834810
4 Canada 2016 1.527990e+12 43.559508

In the below steps, the data set used was a little bit challenging as the date could not parsed directly. In the below steps date is formatted in date format and converted in date type, the columns were renamed and grouped by category.

A special aggregated by mean of the category was presented so that the visualization can be arranged in ascending order and the idea of the visualization is to see where the Energy field is placed for Alberta.

In [17]:
#read raw data
rdata_cpi = pd.DataFrame(pd.read_csv("./stc_18-10-0004-01_consumer_price_index_csv_v48.0_2022-09-20.csv"))


#date is imported as object and cannot change to data time in one step (not that we know)
new_date = rdata_cpi["Date"].str.split("-", n = 1, expand = True)

rdata_cpi.insert(1,'Month', new_date[0])
rdata_cpi.insert(2,'Year',new_date[1])
rdata_cpi['Month'] = pd.Series(rdata_cpi['Month'], dtype="string")
rdata_cpi['Year'] = pd.Series(rdata_cpi['Year'], dtype="string")

rdata_cpi = rdata_cpi.drop('Date',axis=1)
rdata_cpi.insert(0,'Date','01-' + new_date[0]+'-'+new_date[1])
rdata_cpi['Date']= pd.to_datetime(rdata_cpi['Date'])
In [18]:
#print data frame: Raw Data CPI
rdata_cpi.head()
Out[18]:
Date Month Year GEO Products_and_product_groups Value Percent_Change
0 2006-07-01 Jul 06 Alberta All-items 113.4 NaN
1 2006-07-01 Jul 06 Alberta Food 108.3 NaN
2 2006-07-01 Jul 06 Alberta Shelter 126.1 NaN
3 2006-07-01 Jul 06 Alberta Household operations, furnishings and equipment 102.6 NaN
4 2006-07-01 Jul 06 Alberta Clothing and footwear 96.2 NaN
In [19]:
#Manipulate cpi 
cpi_m = rdata_cpi[(rdata_cpi.Products_and_product_groups == 'Energy')]
cpi_m = rdata_cpi.groupby(['Year'],as_index=False).sum()
cpi_m['Year'] = pd.Series('20'+cpi_m['Year'])
cpi_m['Year'] = pd.Series(cpi_m['Year'], dtype='int')
cpi_m = cpi_m.loc[(cpi_m['Year'] > 2009) & (cpi_m['Year'] < 2022)]
cpi_m = cpi_m.drop('Percent_Change',axis=1)
cpi_m.rename(columns={'Value':'CPI_Energy'},inplace=True)

#Manipulate gdp
gdp_m = gdpset[(gdpset.Country_Name == 'Canada')]
gdp_m = gdp_m.melt(id_vars=['Country_Name'],var_name="Year")
gdp_m = gdp_m.drop('Country_Name',axis=1)
gdp_m['Year'] = pd.Series(gdp_m['Year'], dtype='int')
gdp_m.rename(columns={'value':'GDP'},inplace=True)

#Merge both dataframes
gdp_cpi = pd.merge(gdp_m, cpi_m, how='inner', on='Year')
In [20]:
#print data frame: GDP and CPI
gdp_cpi.head()
Out[20]:
Year GDP CPI_Energy
0 2010 1.617340e+12 28677.4
1 2011 1.793330e+12 29724.1
2 2012 1.828370e+12 30066.6
3 2013 1.846600e+12 30396.4
4 2014 1.805750e+12 31108.2
In [21]:
#Manipulate data 
cpiv = rdata_cpi[(rdata_cpi.GEO == 'Alberta')]
cpiv = cpiv.drop('Date',axis=1)
cpiv = cpiv.drop('Month',axis=1)
cpiv.rename(columns={'Value':'CPI'},inplace=True)

#replace values in column Products and product groups
cpiv["Products_and_product_groups"] = cpiv["Products_and_product_groups"].map(
    {"Household operations, furnishings and equipment":"Housing",
     "Recreation, education and reading":"Recreation",
     "Alcoholic beverages, tobacco products and recreational cannabis":"Alcohol",
     "All-items":"All-items",
     "Food":"Food",
     "Shelter":"Shelter",
     "Clothing and footwear":"Clothing and footwear",
     "Transportation":"Transportation",
     "Health and personal care":"Health and personal care",
     "Energy":"Energy"})

#Calculate mean of each group
cpiv = cpiv.groupby(['Products_and_product_groups','Year'],as_index=False).mean()

#order for the plot
orderset = cpiv.groupby(['Products_and_product_groups'],as_index=False).mean()
orderset = orderset.sort_values('CPI')
In [22]:
oilcor = oil_price
gdp_oil = pd.merge(gdp_m, oilcor, how='inner', on='Year')
gdpoildrop = ['Date','Day','Month']
gdp_oil = gdp_oil.drop(gdpoildrop,axis =1)
gdp_oil = gdp_oil.groupby('Year',as_index=False).mean()

gdp_oil
Out[22]:
Year GDP Oil_price
0 2012 1.828370e+12 87.942140
1 2013 1.846600e+12 98.068722
2 2014 1.805750e+12 92.849937
3 2015 1.556510e+12 48.834810
4 2016 1.527990e+12 43.559508
5 2017 1.649270e+12 50.886789
6 2018 1.725330e+12 64.881921
7 2019 1.742020e+12 57.054024
8 2020 1.645420e+12 39.683857
9 2021 1.990760e+12 68.169953
In [23]:
sns.lmplot(data=gdpop, x="GDP", y="Oil_price").set(title="Correlation GDP vs Oil price")
print(gdpop_pearson)
Out[23]:
<seaborn.axisgrid.FacetGrid at 0x7f0029e0b1f0>
PearsonRResult(statistic=0.7157074215797746, pvalue=0.019936932189352493)
In [24]:
#setting parameters for the chart
sns.set_style("dark")
rcParams['figure.figsize'] = 10,6

#visualization of first line in GDP
gdp_oil_p = sns.lineplot(data=gdp_oil.GDP, color="b",label="GDP").set(title='GDP and Oil Price')

#second axis of visualization
ax2 = plt.twinx()
gdp_oil_p = sns.lineplot(data=gdp_oil.Oil_price, color="r", ax=ax2,label="Oil Prices")
gdp_oil_p.get_yaxis().set_major_formatter(mpl.ticker.FuncFormatter(lambda x, p: format(int(x), ',')))
plt.legend(bbox_to_anchor=(0.95, 1), ncol=1)
gdp_oil_p.set_xticks(range(len(gdp_oil)), labels=range(2012, 2022))
Out[24]:
<matplotlib.legend.Legend at 0x7f0029d5a800>
Out[24]:
[<matplotlib.axis.XTick at 0x7f0029b891e0>,
 <matplotlib.axis.XTick at 0x7f0029b89600>,
 <matplotlib.axis.XTick at 0x7f0029ba7f40>,
 <matplotlib.axis.XTick at 0x7f0029bcd2a0>,
 <matplotlib.axis.XTick at 0x7f0029bcdb40>,
 <matplotlib.axis.XTick at 0x7f0029bce2c0>,
 <matplotlib.axis.XTick at 0x7f0029bcea10>,
 <matplotlib.axis.XTick at 0x7f0029ba7d00>,
 <matplotlib.axis.XTick at 0x7f0029e0a290>,
 <matplotlib.axis.XTick at 0x7f00701ef910>]

The purpose of the visualisations above is to introduce the next guiding question and begin the transition from the global scale to the local size where we will discuss Alberta. As we saw from both charts above the same pattern can be observed, so the next step was to get a scatter plot of GDP and Oil prices to see if any correlation exists, while it can be seen in the chart, the statistic also confirms it as it has a strong correlation of 0.712

Key takeaways:

  • Canada is in good health regarding GDP per capita
  • GDP and Oil prices for Canada are correlated
  • GDP and Oil prices seems to have a recovery after COVID-19

Guiding Question 2: Which Alberta’s sector was the most impacted during COVID?

  • Alberta's CPI by sector - Boxplot
  • Oil royalties - Time series

Libraries used for visualizations:

  • Pyplot
  • Seaborn

Data set used:

  • CPI
  • Oil royalties
  • Oil Price

Data Cleaning:

  • Group by
  • Skip rows
  • NaN and 0 values
In [25]:
#plot data
rcParams['figure.figsize'] = 12,6
cpivb=sns.boxplot(data=cpiv,x="Products_and_product_groups", y="CPI",order=orderset['Products_and_product_groups'])
cpivb.tick_params(axis='x', rotation=90)
plt.title('Box plot of Products and product groups')
plt.show()
Out[25]:
Text(0.5, 1.0, 'Box plot of Products and product groups')

The graph below shows the CPI for each industry in the post-pandemic era as box-plots. The CPI is primarily used to gauge the level of inflation that the general public is experiencing. The graph shows that the CPI for "energy" has increased the greatest. This indicates that the customer is the one who is most affected by inflation, particularly when it comes to paying for their energy bills.

The shelter industry has been significantly affected, ranking as the second hardest-hit sector where service costs have increased. People receiving shelter include those who pay rent to their landlords for housing. Additionally, it covers travelers who remain in transient "shelters" like hotels, hostels, etc.

In the post-pandemic era, we have also noticed a considerable increase in the cost of food and alcohol. Therefore, it stands to reason that Albertans would suddenly find it difficult to cover even their most basic expenses, such as energy bills, housing, and food.

In [26]:
# Oil Price of past 10 years.
df_oil_price = pd.read_csv("wti-crude-oil-prices-10-year-daily-chart.csv", skiprows=15) 

# Parsing the Date column into Date-time format type.
#Citation: - https://stackoverflow.com/a/39206377/10912105
df_oil_price['date'] = pd.to_datetime(df_oil_price['date'])
df_oil_price['date'] = pd.to_datetime(df_oil_price['date']).dt.year

# Ordering the DataFrame in descending order according to the "Date" column.
df_oil_price.sort_values(by = 'date', ascending = False, inplace = True)
df_oil_price.reset_index(drop=True, inplace=True)

# Renaming column names.
df_oil_price.rename(columns = {'date':'year', ' value':'oil price ($/bbl)'}, inplace = True)

# Removing all the rows which has 0 or NaN / NA Oil price.

df_oil_price['oil price ($/bbl)'] = df_oil_price['oil price ($/bbl)'].fillna(0)
df_oil_price = df_oil_price[df_oil_price['oil price ($/bbl)'] != 0]

# Citation: - https://stackoverflow.com/a/46827856/10912105
# Taking the mean of the oil price per barrel according to each year.
aggregation_functions = {'oil price ($/bbl)': 'mean'}
df_oil_price = df_oil_price.groupby(['year'], as_index=False).aggregate(aggregation_functions).reindex(columns=df_oil_price.columns)

# Ordering the DataFrame in descending order according to the "year" column.
df_oil_price.sort_values(by = 'year', ascending = False, inplace = True)
df_oil_price.reset_index(drop=True, inplace=True)

# Dropping all the values in the DataFrame where year < 2016.
df_oil_price.drop(df_oil_price[(df_oil_price['year'] < 2016)].index, inplace=True)
In [27]:
#print data frame: oil price royalties
df_oil_price.head()
Out[27]:
year oil price ($/bbl)
0 2022 98.010197
1 2021 68.169953
2 2020 39.683857
3 2019 57.054024
4 2018 64.881921

In the next section,The following Data Cleaning steps were taken: -

  1. Parsing the columns into relevant format type.
  2. Deleting irrelevant columns.
  3. Ordering the DataFrames according to desired row values.
  4. Removing all NaN and NA data values.
  5. Aggregation and Grouping of specific column's data values.

Data Sourcing and Data Cleansing consists of around 80% of the code. The rest 20% of the code is dedicated to plotting the graph.

For plotting the graph, Python's Plotly was used.

A line graph is made above depicting the changes in gross revenue and net revenue of Oil and Gas Industries with respect to time. The amount of Royalty paid by the Oil and Gas industry to the Albertan government is also shown.

The Line graph is divided into 3 time periods: -

  1. Pre Pandemic (2016-2019)
  2. COVID-19 Pandemic (2019-2020)
  3. Post Pandemic (2020-2021)
In [28]:
# Data Scourcing of all Excel Files.

# Oil Royalties data from 2016 - 2021.

df_oil_royalties_2021 = pd.DataFrame(pd.read_excel("energy-2021-project-data-for-open-data-as-20220509-12am.xlsx"))
df_oil_royalties_2020 = pd.DataFrame(pd.read_excel("energy-2020-project-data-for-open-data-as-20220509-12am.xlsx"))
df_oil_royalties_2019 = pd.DataFrame(pd.read_excel("energy-2019-project-data-for-open-data-as-20220509-12am.xlsx"))
df_oil_royalties_2018 = pd.DataFrame(pd.read_excel("energy-2018-project-data-for-open-data-as-20220509-12am.xlsx"))
df_oil_royalties_2017 = pd.DataFrame(pd.read_excel("energy-2017-project-data-for-open-data-as-20220509-12am.xlsx"))
df_oil_royalties_2016 = pd.DataFrame(pd.read_excel("energy-2016-project-data-for-open-data-as-20220509-12am.xlsx"))

#Combining all the Excel files into 1 DataFrame.

df_oil_royalties = pd.concat([df_oil_royalties_2021, df_oil_royalties_2020, df_oil_royalties_2019, df_oil_royalties_2018, df_oil_royalties_2017, df_oil_royalties_2016])

# Data Cleaning of all Excel Files.
df_oil_royalties = df_oil_royalties.filter(['Operator Name', 'Reporting Year', 'Gross Revenue ($)', 'Gross Revenue ($/bbl)', 'Net Revenue ($)', 'Royalty Payable ($)'])

# Converting values from Scientific notation to Regular Integer notation.
df_oil_royalties[['Gross Revenue ($)', 'Net Revenue ($)', 'Royalty Payable ($)']] = df_oil_royalties[['Gross Revenue ($)', 'Net Revenue ($)', 'Royalty Payable ($)']].astype('int64')

# Removing all the rows which has 0 or NaN / NA Gross Revenue.
df_oil_royalties['Gross Revenue ($)'] = df_oil_royalties['Gross Revenue ($)'].fillna(0)
df_oil_royalties = df_oil_royalties[df_oil_royalties['Gross Revenue ($)'] != 0]

# Citation: - https://stackoverflow.com/a/46827856/10912105
# Taking the sum of the Gross Revenue, Net Revenue, and Royalty Payable. Taking the mean of Gross Revenue ($/bbl). Grouping by Year and Operator Name.
aggregation_functions = {'Gross Revenue ($)': 'sum', 'Gross Revenue ($/bbl)': 'mean', 'Net Revenue ($)': 'sum', 'Royalty Payable ($)' : 'sum'}
df_oil_royalties = df_oil_royalties.groupby(['Operator Name','Reporting Year'], as_index=False).aggregate(aggregation_functions).reindex(columns=df_oil_royalties.columns)

# Ordering the DataFrame in descending order according to column name "Reporting Year".
df_oil_royalties.sort_values(by = 'Reporting Year', ascending = False, inplace = True)
df_oil_royalties.reset_index(drop=True, inplace=True)

# Taking subset of the main DataFrame df_oil_royalties in order to show Gross Revenue separately.
df_gross_year = df_oil_royalties[['Reporting Year', 'Gross Revenue ($)']]

# Summing up the Gross Revenue year wise.
aggregation_functions = {'Gross Revenue ($)': 'sum'}
df_gross_year = df_gross_year.groupby(['Reporting Year'], as_index=False).aggregate(aggregation_functions).reindex(columns = df_gross_year.columns)

# Taking subset of the main DataFrame df_oil_royalties in order to show Gross Revenue per Barrel of Oil.
df_gross_per_bbp_year = df_oil_royalties[['Reporting Year', 'Gross Revenue ($/bbl)']]

# Taking the mean of the Gross Revenue per Barrel, year wise.
aggregation_functions = {'Gross Revenue ($/bbl)': 'mean'}
df_gross_per_bbp_year = df_gross_per_bbp_year.groupby(['Reporting Year'], as_index=False).aggregate(aggregation_functions).reindex(columns = df_gross_per_bbp_year.columns)

# Taking subset of the main DataFrame df_oil_royalties in order to show Net Revenue separately.
df_net_year = df_oil_royalties[['Reporting Year', 'Net Revenue ($)']]

# Summing up the Net Revenue year wise.
aggregation_functions = {'Net Revenue ($)': 'sum'}
df_net_year = df_net_year.groupby(['Reporting Year'], as_index=False).aggregate(aggregation_functions).reindex(columns = df_net_year.columns)

# Taking subset of the main DataFrame df_oil_royalties in order to show the royalty paid to the government of Alberta.
df_royalty_year = df_oil_royalties[['Reporting Year', 'Royalty Payable ($)']]

# Summing up the Royalty to the government paid year wise.
aggregation_functions = {'Royalty Payable ($)': 'sum'}
df_royalty_year = df_royalty_year.groupby(['Reporting Year'], as_index=False).aggregate(aggregation_functions).reindex(columns = df_royalty_year.columns)
In [29]:
#print data frame: oil royalties by year
df_royalty_year.head()
Out[29]:
Reporting Year Royalty Payable ($)
0 2016 843040394
1 2017 2480251532
2 2018 2738473691
3 2019 4836819255
4 2020 1067122211
In [30]:
# Showing Gross Revenue, Net Revenue and Royalty payable together.
x = df_gross_year['Reporting Year']
y1 = df_gross_year['Gross Revenue ($)']
y2 = df_net_year['Net Revenue ($)']
y3 = df_royalty_year['Royalty Payable ($)']

# Create traces
oil_recovery = go.Figure()
oil_recovery= oil_recovery.add_trace(go.Scatter(x = x, y = y1,
                    mode='lines+markers',
                    name='Gross Revenue'))

oil_recovery=oil_recovery.add_trace(go.Scatter(x = x, y = y2,
                    mode='lines+markers',
                    name='Net Revenue'))

oil_recovery=oil_recovery.add_trace(go.Scatter(x = x, y = y3,
                    mode='lines+markers', name='Royalty'))

oil_recovery=oil_recovery.add_vrect(x0=2019, x1=2020, 
              annotation_text="COVID-19 Pandemic", annotation_position="top",
              fillcolor="green", opacity=0.25, line_width=1)

oil_recovery=oil_recovery.update_layout(
    title = "Gross Revenue, Net Revenue and Royalty with respect to Time",
    xaxis_title = "Year",
    yaxis_title = "($)",
    legend_title="Legend",
    font=dict(
        family="Courier New, monospace",
        size=12,
        color="Black"
    )
)
In [31]:
oil_recovery.show()

The above Oil Recovery graph was made with the data which was sourced/extracted from of 1 csv and 6 Excel files. The irrelevant data was deleted from the DataFrame and only the useful columns were kept.

  1. As we can observe from the graph, The gross revenue earned by Oil and Gas had steadily been increasing during pre pandemic times.
  2. During the COVID-19 Pandemic, there has been a sharp decrease in Gross Revenue earned by Oil and Gas. This is due to reduce demand for oil.
  3. There is a sharp growth/rebound in Gross Revenue earned by Oil and Gas industry due to resumed demand for oil and gas in the post pandemic phase.

The same can also be observed for Net revenue earned by Oil and Gas, and the Royalty which was paid to the Alberta's government for the following time period.

Key takeaways:

  • Energy is greatest expense
  • COVID-19 pandemic hit on revenues
  • High rebound at 2020-2021

Guiding Question 3: How has inflation impacted different aspects of people’s life?

  • Lending loan interest Rates – Timeseries Violin Plot
  • Prime interest rates - Heat Map
  • UnEmployment Statistics – Times Series comparision
  • Average Earnings of Albertans - Line Graph

Libraries used for visualizations:

  • Seaborn
  • Express
  • Graph_objects

Data set used:

  • Historical Lending Rates
  • Prime Interest Rates History
  • UnEmployment Data
  • Average weekly earnings by people in Alberta

Data Cleaning:

  • Read CSV
  • Handle 0s and NaN, parse alpha numeric strings
  • create functions to check types
  • Transform astype, date time, float
  • Rename columns
  • Merge columns and tables
  • Assigning categories
  • Aggregate data over categories
In [32]:
import re
def isDate(stringDate):
    try:
        
        parse(stringDate, fuzzy=False)
        return True

    except ValueError:
        return False
def isfloat(a_string):
    try:
        float(a_string)
        return True
    except ValueError:
        return False    


def getLROI(strInp):
    
    new_string = (strInp.lstrip()).rstrip()
    if(isfloat(new_string)):
        return round(pd.to_numeric(new_string),2)
    else:
        new_result = re.findall('[0-9]+', new_string)
        if (len(new_result)>2):
            numbers=pd.to_numeric(new_result)
            return round((numbers[0]-(numbers[1]/numbers[2])),2) 
    
plt.style.use('bmh')
from dateutil.parser import parse

The data for lending interest rates for local authorities is parsed in the stages below, and special characters and NA are handled. For the purpose of displaying the change in interest rates, the data is grouped and averaged over years. The aggregated comparisons are visualised using Plotly express.

In [33]:
#DataSet for lending interest rates for loan to local authorities 

##read lending rates from csv file , rename columns and handle special characters
lendingRates_rawData = pd.read_csv(r"tbf-ltla-historical-lending-rates.csv",
                 skiprows=8,na_filter=True,skip_blank_lines=True,skipinitialspace=True,on_bad_lines='skip',skipfooter=10,
                 false_values=['Date','3 years','5 years','10 years','15 years','20 years','25 years','30 years'],engine='python')



lendingRates_rawData.rename(columns = {'Unnamed: 0':'date', 'Unnamed: 1':'3year', 'Unnamed: 2':'5year', '2021':'10year', 'Unnamed: 4':'15year',
       'Unnamed: 5':'20year', 'Unnamed: 6':'25year', 'Unnamed: 7':'30year'}, inplace = True)

lendingRates_rawData=lendingRates_rawData.replace(
    to_replace='%', 
    value='',
    regex=True)

##lendingRates_rawData - drop data with invalid dates, check for valid date and convert to date type, filtering based on date column
lendingRates_rawData.drop(lendingRates_rawData[(lendingRates_rawData.date.isnull()) | (lendingRates_rawData.date=='Date ')].index,inplace=True)
lendingRates_rawData['isDate'] = lendingRates_rawData.apply(lambda row : isDate(row['date']), axis = 1)
lendingRates_rawData.drop(lendingRates_rawData[ (lendingRates_rawData.isDate==False)].index,inplace=True)
lendingRates_rawData['date']=(pd.to_datetime(lendingRates_rawData['date'])).dt.to_period('M')
lendingRates_rawData.drop(lendingRates_rawData[lendingRates_rawData['date']<'2010-01'].index,inplace=True)

##lendingRates - Refined interested columns, handle na for filter Data from lendingRates_rawData
lendingRates=lendingRates_rawData.filter(["date", "25year"])
lendingRates[['25year']]=lendingRates[['25year']].astype("string")

lendingRates=lendingRates.replace(
    to_replace="N/A ", 
    value='0',
    regex=True)

##lendingRates - get valid interest dates 
lendingRates['25year']=lendingRates_rawData['25year'].apply(lambda row : getLROI(row))
dfAvgMonthlyLendingRate=(lendingRates.sort_values('date').groupby(by=['date'],as_index=False)['25year'].mean())
lendingRates['year']=lendingRates['date'].dt.year
In [34]:
#print data frame: Lending Rates
lendingRates.head()
Out[34]:
date 25year year
0 2021-01 2.22 2021
1 2021-01 2.23 2021
2 2021-02 2.30 2021
3 2021-02 2.52 2021
4 2021-03 2.65 2021
In [35]:
##lendingRates - visualise using violin plot
lending_plot = px.violin(data_frame=lendingRates,x=lendingRates['year'], y=lendingRates['25year'],
                 box=True,
                 title='25 yr lending rate to local authorities',log_x=True
                 ,labels={"x":"Year","y":"Lending rates to local authority"}
                 #,color="year"
                )
In [36]:
lending_plot.show()

The loan interest rates for the capital provided to local authorities for administering the province are shown in the above graph. The graph displays the relationship between interest rates and GDP expansion. Lower interest rates are readily seen for the sluggish GDP growth years of 2015–2016 and 2019–2020. When the economy is recovering, interest rates rise as the gdp growth improves. The purpose of the higher interest rates is to stop the pressure on prices from getting any worse. Conversely, as growth slows, interest rates are reduced.

In [37]:
#DataSet for prime interest rates 

##primeRates;  prime interest rates from csv file , rename columns and convert to appropriate data types, handle na 

primeRates = pd.read_csv(r"Prime-Rate-History-wowa.csv",
                 na_filter=True,skip_blank_lines=True,skipinitialspace=True,on_bad_lines='skip')
primeRates.rename(columns = {'Date':'date', 'Prime Rate':'primeRatePer', 'Bank of Canada Overnight Rate':'BOIOvernightChngPer'}, inplace = True)
avgPrimeRate2022=primeRates[pd.to_datetime(primeRates['date'])=='2022-10-01']['primeRatePer'].mean()
df2 = pd.DataFrame({'date': ['2022-11-01','2022-12-01'],
                    'primeRatePer': [avgPrimeRate2022,avgPrimeRate2022],
                    'BOIOvernightChngPer': [3.25, 3.25]})

primeRates = primeRates.append(df2, ignore_index = True)

primeRates['date']=pd.to_datetime(primeRates['date']).dt.to_period('M')
primeRates.drop(primeRates[primeRates['date']<'2010-01'].index,inplace=True)
primeRates=primeRates.sort_values('date')
primeRates['year']=primeRates['date'].dt.year
primeRates['month']=primeRates['date'].dt.month
primeRates.dropna()
/tmp/ipykernel_618/3923255997.py:13: FutureWarning:

The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.

Out[37]:
date primeRatePer BOIOvernightChngPer year month
626 2010-01 2.25 0.25 2010 1
627 2010-02 2.25 0.25 2010 2
628 2010-03 2.25 0.25 2010 3
629 2010-04 2.25 0.25 2010 4
630 2010-05 2.25 0.50 2010 5
... ... ... ... ... ...
777 2022-08 4.70 2.50 2022 8
778 2022-09 5.45 3.25 2022 9
779 2022-10 5.45 3.25 2022 10
780 2022-11 5.45 3.25 2022 11
781 2022-12 5.45 3.25 2022 12

156 rows Ɨ 5 columns

In [38]:
#print data frame: prime rates
primeRates.head()
Out[38]:
date primeRatePer BOIOvernightChngPer year month
626 2010-01 2.25 0.25 2010 1
627 2010-02 2.25 0.25 2010 2
628 2010-03 2.25 0.25 2010 3
629 2010-04 2.25 0.25 2010 4
630 2010-05 2.25 0.50 2010 5
In [39]:
#pivot prime rate data frame for visualising in  heat map

rates = primeRates.pivot("month", "year", "primeRatePer")
grid_kws = {"height_ratios": (.9, .05), "hspace": .3}
cmap = sns.cm.rocket_r
f, (ax, cbar_ax) = plt.subplots(2, gridspec_kw=grid_kws,figsize=(15, 10))
ax = sns.heatmap(rates, ax=ax,
                 cbar_ax=cbar_ax,
                 cbar_kws={"orientation": "horizontal"},annot=True,cmap=cmap)
ax.set_title('Heatmap of Interest rates')
Out[39]:
Text(0.5, 1.0, 'Heatmap of Interest rates')

The prime interest rates, are the rates at which people get loan are shown in the following graph. From the above graph we can clearly see the prime interest rates were quite stable between 2011- 2014, it went down to 2.7% during recession during 2015-2016. A similar trend is observed during pandemic, after a stable rates between 2017-2019, the rate of interest were reduced during pandic and then soared. The increased rates directly impacts the variable rate mortages and line of credits. With the soar in interest rates to 5.5%, the mortage rates has increased , thus affecting the housing market as sales have slipped. It also affects the spending power of people having mortages.

Unemployment The rate between young adults and those over 25 can be found by reading the csv file below. Data set columns are renamed and changed into the proper data types as part of the data cleaning process. Then NAĀ is handled, and data is filtered for Seaborn visualization.

In [40]:
#DataSet for un employment data

##UnEmployment Rate among between young adult and people over 25;  prime interest rates from csv file , rename columns and convert to appropriate data types, handle na 

unEmpRates = pd.read_csv(r"UnEmploymentData.csv",
                 na_filter=True,skip_blank_lines=True,skipinitialspace=True,on_bad_lines='skip')

unEmpRates[['Characteristic', 'NAICS',
       'AgeGroup', 'Sex']]=unEmpRates[['Characteristic', 'NAICS',
       'AgeGroup', 'Sex']].astype("string")
unEmpRates['When']=pd.to_datetime(unEmpRates['When']).dt.to_period('M')
unEmpRates.drop(unEmpRates[unEmpRates['When']<'2010-01'].index,inplace=True)
unEmpRates['Sex']=unEmpRates['Sex'].apply(lambda row : (row.lstrip()).rstrip())
unEmpRates['Year']=unEmpRates['When'].dt.year
filtered_unEmpRates = unEmpRates.loc[(unEmpRates['Sex'] == 'Both sexes' )&(unEmpRates['NAICS']=='All Industries')][['When','Year','NAICS','AgeGroup','Alberta']]
filtered_unEmpRates.dropna()
Out[40]:
When Year NAICS AgeGroup Alberta
29451 2010-01 2010 All Industries 15 to 24 years 11.1
29452 2010-01 2010 All Industries 15 years and over 6.7
29457 2010-02 2010 All Industries 15 to 24 years 10.8
29458 2010-02 2010 All Industries 15 years and over 6.8
29463 2010-03 2010 All Industries 15 to 24 years 12.8
... ... ... ... ... ...
30352 2022-07 2022 All Industries 15 years and over 4.8
30357 2022-08 2022 All Industries 15 to 24 years 10.6
30358 2022-08 2022 All Industries 15 years and over 5.4
30363 2022-09 2022 All Industries 15 to 24 years 9.9
30364 2022-09 2022 All Industries 15 years and over 5.5

306 rows Ɨ 5 columns

In [41]:
#print data frame: unemployment rates
filtered_unEmpRates.head()
Out[41]:
When Year NAICS AgeGroup Alberta
29451 2010-01 2010 All Industries 15 to 24 years 11.1
29452 2010-01 2010 All Industries 15 years and over 6.7
29457 2010-02 2010 All Industries 15 to 24 years 10.8
29458 2010-02 2010 All Industries 15 years and over 6.8
29463 2010-03 2010 All Industries 15 to 24 years 12.8
In [42]:
ueplot=(20, 10)
ueplot = px.violin(filtered_unEmpRates, y="Alberta", x="Year", color="AgeGroup",
                box=True, points="all",log_x=True,width=1000
                ,title='UnEmployment Rate in Alberta over the years among people'
                 ,labels={"Alberta":"Unemployment Rate in Alberta"}
               )
In [43]:
ueplot.show()

The graph above displays the trend in Alberta's unemployment rate for young adults aged 15 to 24 and for the general population. It was pretty clear that the unemployment rate for both categories had skyrocketed in 2019–2020. Since 2021, the unemployment rate has dropped for everyone following the epidemic phase. This can be explained by the increasing GDP growth rate, which leads to the creation of jobs and a decrease in unemployment.

In the next steps,Average weekly earning are read from the csv files. As part of data cleaning date is parsed, Industries are grouped and assigned categories. The data is further grouped and filtered for visualisation using plotly.

In [44]:
##avgEarnings data: read csv , skip na, change categorical values , handlespecial characters
avgEarnings=pd.read_csv(r"./average_weekly_earnings_monthly.csv",skiprows=0)  

avgEarnings[['Geography','Estimate','Industry','Average_Weekly_Earnings']]=avgEarnings[['Geography','Estimate','Industry','Average_Weekly_Earnings']].astype("string")
avgEarnings['Reference_Date']=(pd.to_datetime(avgEarnings['Reference_Date'])).dt.to_period('M')
avgEarnings['Average_Weekly_Earnings']=avgEarnings['Average_Weekly_Earnings'].str.replace("$","",regex='True')
avgEarnings['Average_Weekly_Earnings']=pd.to_numeric(avgEarnings['Average_Weekly_Earnings'])
avgEarnings['year']=avgEarnings['Reference_Date'].dt.year

avrEarningsAlberta=avgEarnings.loc[(avgEarnings['Geography']=='Alberta') & (avgEarnings['year']>=2010)][['Reference_Date','year','Geography','Industry','Average_Weekly_Earnings']]
avrEarningsAlberta['Industry']=avrEarningsAlberta['Industry'].apply(lambda row : row.replace("Accommodation and food services [72]","Hospitality"))
avrEarningsAlberta['Industry']=avrEarningsAlberta['Industry'].apply(lambda row : row.replace('Arts, entertainment and recreation [71]',"entertainment"))
avrEarningsAlberta['Industry']=avrEarningsAlberta['Industry'].apply(lambda row : row.replace('Construction [23]',"Construction"))
avrEarningsAlberta['Industry']=avrEarningsAlberta['Industry'].apply(lambda row : row.replace('Educational services [61]',"Education Service"))
avrEarningsAlberta['Industry']=avrEarningsAlberta['Industry'].apply(lambda row : row.replace('Health care and social assistance [62]',"Health and Service"))
avrEarningsAlberta['Industry']=avrEarningsAlberta['Industry'].apply(lambda row : row.replace('Finance and insurance [52]',"Finance"))
avrEarningsAlberta['Industry']=avrEarningsAlberta['Industry'].apply(lambda row : row.replace('Mining, quarrying, and oil and gas extraction [21]',"Oil and NaturalGas"))
avrEarningsAlberta['Industry']=avrEarningsAlberta['Industry'].apply(lambda row : row.replace('Transportation and warehousing [48-49]',"Transport"))

avrEarningsAlberta['Industry']=avrEarningsAlberta['Industry'].apply(lambda row : row.replace('Trade [41-45N]',"Trade"))
avrEarningsAlberta['Industry']=avrEarningsAlberta['Industry'].apply(lambda row : row.replace('Retail trade [44-45]',"Trade"))
avrEarningsAlberta['Industry']=avrEarningsAlberta['Industry'].apply(lambda row : row.replace('Wholesale trade [41]',"Trade"))

avrEarningsAlberta['Industry']=avrEarningsAlberta['Industry'].apply(lambda row : row.replace('Utilities [22]',"Misc"))
avrEarningsAlberta['Industry']=avrEarningsAlberta['Industry'].apply(lambda row : row.replace('Public administration [91]',"Administration"))
avrEarningsAlberta['Industry']=avrEarningsAlberta['Industry'].apply(lambda row : row.replace('Service producing industries [41-91N]',"Industrial Sector"))

avrEarningsAlberta['Industry']=avrEarningsAlberta['Industry'].apply(lambda row : row.replace('Industrial aggregate excluding unclassified businesses [11-91N]',"Industrial Sector"))
avrEarningsAlberta['Industry']=avrEarningsAlberta['Industry'].apply(lambda row : row.replace('Management of companies and enterprises [55]',"Industrial Sector"))
avrEarningsAlberta['Industry']=avrEarningsAlberta['Industry'].apply(lambda row : row.replace('Goods producing industries [11-33N]',"Industrial Sector"))
avrEarningsAlberta['Industry']=avrEarningsAlberta['Industry'].apply(lambda row : row.replace('Information and cultural industries [51]',"Industrial Sector"))
avrEarningsAlberta['Industry']=avrEarningsAlberta['Industry'].apply(lambda row : row.replace('Real estate and rental and leasing [53]','Rentals'))
avrEarningsAlberta['Industry']=avrEarningsAlberta['Industry'].apply(lambda row : row.replace('Administrative and support, waste management and remediation services [56]','Administration'))
avrEarningsAlberta['Industry']=avrEarningsAlberta['Industry'].apply(lambda row : row.replace('Durable goods [321N]' ,'goods'))
avrEarningsAlberta['Industry']=avrEarningsAlberta['Industry'].apply(lambda row : row.replace(  'Forestry, logging and support [11N]'  ,'Forestry'))
avrEarningsAlberta['Industry']=avrEarningsAlberta['Industry'].apply(lambda row : row.replace('Other services (except public administration) [81]','Misc'))
avrEarningsAlberta['Industry']=avrEarningsAlberta['Industry'].apply(lambda row : row.replace('Professional, scientific and technical services [54]','Misc'))
avrEarningsAlberta['Industry']=avrEarningsAlberta['Industry'].apply(lambda row : row.replace('Non-durable goods [311N]','Misc'))
avrEarningsAlberta['Industry']=avrEarningsAlberta['Industry'].apply(lambda row : row.replace('Manufacturing [31-33]',"goods"))

##avrEarningsAlbertaGrpd data: filter , aggreagte data over industries , year for alberta

avrEarningsAlbertaGrpd=avrEarningsAlberta.sort_values('year').groupby(by=["Geography","Industry","year"])["Average_Weekly_Earnings"].mean()
avrEarningsAlbertaGrpd=avrEarningsAlbertaGrpd.reset_index()

avgEarnings_data = avgEarnings.loc[ (avgEarnings['Geography']!="Canada")].sort_values('year').groupby(by=["Geography","Industry","year"])["Average_Weekly_Earnings"].mean()
avrEarningsAlberta["Industry"].unique()

avgEarnings_data = avgEarnings_data.reset_index()
Out[44]:
array(['Hospitality', 'Administration', 'entertainment', 'Construction',
       'goods', 'Education Service', 'Finance', 'Forestry',
       'Industrial Sector', 'Health and Service', 'Oil and NaturalGas',
       'Misc', 'Rentals', 'Trade', 'Transport'], dtype=object)
In [45]:
#print data frame: 
avgEarnings_data.head()
Out[45]:
Geography Industry year Average_Weekly_Earnings
0 Alberta Accommodation and food services [72] 2001 262.765833
1 Alberta Accommodation and food services [72] 2002 272.731667
2 Alberta Accommodation and food services [72] 2003 271.893333
3 Alberta Accommodation and food services [72] 2004 287.201667
4 Alberta Accommodation and food services [72] 2005 297.199167
In [46]:
##Visualise average earings over the years to understand the trend

industry_chart = px.line(avrEarningsAlbertaGrpd, x = "year", y = "Average_Weekly_Earnings",
              color = "Industry",
              color_discrete_map={
                'Oil and NaturalGas': "red",
                'Forestry': "green",
                'Finance': "blue",
                "Oceania": "goldenrod",
                'Forestry': "magenta",
                 'Hospitality':"darkviolet",
                 'Administration':"indianred", 
                 'entertainment':"mediumaquamarine", 
                 'Construction':"olivedrab",
                 'goods':"steelblue", 
                 'Education Service':"darkslategray", 
                 'Industrial Sector':"deeppink", 
                 'Health and Service':"black" ,
                 'Misc':"chocolate", 
                 'Rentals':"darkviolet", 
                 'Trade':"dodgerblue", 
                 'Transport':"green"}
              ,width=1000
              ,line_dash="Industry"
              ,labels={"Average_Weekly_Earnings":"Average Earnings grouped over years"}
              ,title= "Average earning categorised by industry over years in ALberta"
             
             )
In [47]:
industry_chart.show()

The graph up top displays the typical income earned by Albertans in various industries. We can deduce from the graph that Albertans' earnings have gradually increased since the outbreak.

Key takeaways

  • Unemploynment and average earning has improved
  • Lending interest and prime rates has not recovered

Guiding Question 4: General situation

Libraries used for visualizations:

  • Plotly express

Data set used:

  • GDP per capita

Data Cleaning:

  • date parsing
  • drop/rename columns
  • NaN values
In [48]:
rdata_gdppercapita = pd.read_csv("./canada-gdp-per-capita.csv",skiprows=16,parse_dates=['date'])

#manipulation of date
yseriesg = rdata_gdppercapita['date'].map(lambda x: x.year)
rdata_gdppercapita.insert(1,'Year',yseriesg)
In [49]:
#manipulating data frame
gdppc_todrop = [" ", "date"]
gdppc = rdata_gdppercapita.drop(gdppc_todrop,axis=1)
gdppc.rename(columns={' GDP Per Capita (US $)':'GDP_per_capita',' Annual Growth Rate (%)':'Annual_Growth_Rate'},inplace=True)
In [50]:
#plot
gdppc["Color"] = np.where(gdppc["Annual_Growth_Rate"]<0, 'red', 'green')

gdppcp = make_subplots(specs=[[{"secondary_y": True}]])

gdppcp=gdppcp.add_trace(go.Bar(x=gdppc['Year'],y=gdppc['Annual_Growth_Rate'],
                 marker_color=gdppc['Color'],name='Growth Rate'),
                 secondary_y=False)

gdppcp=gdppcp.add_trace(go.Scatter(x=gdppc['Year'],y=gdppc['GDP_per_capita'],
                            name="GDP Per Capita"),secondary_y=True,)

gdppcp=gdppcp.update_yaxes(range=[-40000, 55000],secondary_y=True)
gdppcp=gdppcp.update_yaxes(title="GDP per capita ($US)",secondary_y=True)
gdppcp=gdppcp.update_yaxes(title="Growth Rate",secondary_y=False)


gdppcp=gdppcp.update_layout(dragmode="zoom",hovermode="x",legend=dict(traceorder="reversed"),height=600,
    template="simple_white",margin=dict(t=100,b=100),)

gdppcp=gdppcp.update_layout(title="GDP per capita and Growth Rate for Canada")
In [51]:
gdppcp.show()

Conclusion

Our aim of the project was to know how COVID-19 has impacted Canada (Bird's eye view), Alberta (The province which concerns us the most), and the economic impact of COVID-19 on Alberta's consumers and producers.

To measure the economic health of Canada, we look at the fluctuation in its GDP. The biggest and most important sector (export-wise) in Canada as a whole is Oil & Gas and Mining sector. Therefore, we also look at oil price fluctuation as well since high oil prices result in a growing economy for Canada and vice-versa.

We observed that during COVID-19, The GDP of Canada took a big hit due to low oil prices and reduced demand for goods that are mainly exported by Canada. We can also see that after 2020, (post-pandemic) there has been a huge rebound which is a positive indicator signaling a rapid recovery.

We also measured the correlation of Canada's GDP with Oil Prices to know how hugely oil prices affect the GDP of Canada. We found a positive correlation of 0.716! which indicates that the Canadian economy is very sensitive to global oil prices and high oil prices are good for the Canadian economy.

When it comes to Alberta, we wanted to know how the consumers in Alberta province and major producers in Alberta were affected due to COVID-19. The biggest industry in Alberta is Oil and Gas, therefore we take a look in great detail at how the Oil & Gas industry of Alberta was affected.

We found out that during pre-pandemic times, the gross revenue earned by Oil & Gas was increasing steadily at a slow pace; during the pandemic, we saw a huge dip in gross revenue (huge losses) due to less oil demand and post pandemic; we see that the industry has made a roaring comeback with record high profits, never seen before.

Till now, we have talked about how Canada and Alberta, in general, were affected, but now we will talk about how the common man and the general public of Alberta were affected due to the pandemic. To know that, we looked at CPI in the post-pandemic era and the unemployment rate. We found that the cost of living for most basic things like Energy, Housing, Food & Alcohol has risen significantly post-pandemic. This creates a lot of hardship for those Albertans who were living paycheck to paycheck before the pandemic struck. We also observed a huge spike in the unemployment rate during the COVID-19 Pandemic (35% of people being unemployed at one point in time!), This rate has come down in the year 2022, but still, the unemployment rate is a little bit higher when compared with the pre-pandemic era, which is concerning since it shows that new job postings have not been created fast enough for the rest of the people.

We also observed the response of the Canadian government and central banks during the pandemic. These institutions tried their best to stop a deep recession/depression during turbulent times. We observed that the banks started giving out loans/credit to other people during the pandemic at very cheap/low-interest rates to create demand and hold the economy steady. the interest rates have again gone up since the times are now better and the economy is in the recovery phase.

In the end, we show that during the pandemic, the GDP of Canada contracted by around 12.5 % and during post pandemic, the GDP grew by 20%, which is a good sign. All in all, we see that even though the economy has largely recovered, there is still some inertia in the system regarding unemployment rates and the general well-being of the Canadian people. We expect these conditions to improve as time furtherĀ movesĀ on.